A Near - Optimal Polynomial TimeAlgorithm for Learning in StochasticGames

نویسندگان

  • Ronen I. Brafman
  • Moshe Tennenholtz
چکیده

We present a new algorithm for polynomial time learning of optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh 5] in reinforcement learning and of Monderer and Tennenholtz 7] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this adversary can withhold information about the game matrix by refraining from (or rarely) performing certain actions. This forces upon us an exploration vs. exploitation dilemma more complex than in Markov decision processes. Namely, given information about particular parts of a game matrix, how much eeort should the agent invest in learning its unknown parts. We explain and address these issues within the class of single controller stochastic games and explain how these ideas can be extended to stochastic games in general.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Near-Minimum-Time Motion Planning of Manipulators along Specified Path

The large amount of computation necessary for obtaining time optimal solution for moving a manipulator on specified path has made it impossible to introduce an on line time optimal control algorithm. Most of this computational burden is due to calculation of switching points. In this paper a learning algorithm is proposed for finding the switching points. The method, which can be used for both ...

متن کامل

Improved teaching–learning-based and JAYA optimization algorithms for solving flexible flow shop scheduling problems

Flexible flow shop (or a hybrid flow shop) scheduling problem is an extension of classical flow shop scheduling problem. In a simple flow shop configuration, a job having ‘g’ operations is performed on ‘g’ operation centres (stages) with each stage having only one machine. If any stage contains more than one machine for providing alternate processing facility, then the problem...

متن کامل

Minimizing Makespan with Start Time Dependent Jobs in a Two Machine Flow Shop

[if gte mso 9]> The purpose of this paper is to consider the problem of scheduling a set of start time-dependent jobs in a two-machine flow shop, in which the actual processing times of jobs increase linearly according to their starting time. The objective of this problem is to minimize the makespan. The problem is known to be NP-hardness[ah1] ; therefore, there is no polynomial-time algorithm...

متن کامل

Optimal Observer Path Planning For Bearings-Only Moving Targets Tracking Using Chebyshev Polynomials

In this paper, an optimization problem for the observer trajectory in the bearings-only surface moving target tracking (BOT) is studied. The BOT depends directly on the observability of the target's position in the target/observer geometry or the optimal observer maneuver. Therefore, the maximum lower band of the Fisher information matrix is opted as an independent criterion of the target estim...

متن کامل

A Near-Optimal Poly-Time Algorithm for Learning a class of Stochastic Games

We present a new algorithm for polynomial time learning of near optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [ 1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999